Abstract

We study a non linear regression model with functional data as inputs and scalar response. We 
propose a pointwise estimate of the regression function that maps a Hilbert space onto the real line 
by a local linear method. We provide the asymptotic mean square error. Computations involve a 
linear inverse problem as well as a representation of the small ball probability of the data and are 
based on recent advances in this area. The rate of convergence of our estimate outperforms those 
already obtained in the literature on this model. 
1 Introduction
1.1 The data and the model 



In probability theory, random functions have been in the spotlight for quite a long time. The tremendous
advances in computer science and the opportunity to deal with data collected at a high frequency now make
it possible for statisticians to study models for "high-dimensional data". As a consequence many
of them focused their attention on models for functional data, i.e. models that are suited for curves, for
instance spectral curves, growth curves or interest rate curves.

Even if seminal articles on functional data analysis date back more than 20 years (see Dauxois,
Pousse and Romain (1982)), this area is currently very active. The book by Ramsay and
Silverman (1997) initiated a series of monographs on the subject: Bosq (2000), Ramsay and Silverman
again (2002), Ferraty and Vieu (2006).

Functional Data Analysis has drawn much attention and many of the classical multivariate data
analysis techniques such as Principal Component Analysis, Correlation Analysis, ANOVA, Linear
Discrimination were generalized to curves. But statistical inference gave and still gives birth to many papers.
Linear regression and autoregression for instance raise an interesting inverse problem (see Yao, Müller,
Wang (2005), Müller, Stadtmüller (2005), Cai, Hall (2006), Cardot, Mas, Sarda (2007), Mas (2007a)).
Even more recently the case of nonparametric regression was introduced in Ferraty, Vieu (2003) then
studied in Masry (2005) and Ferraty, Mas, Vieu (2007): a Nadaraya-Watson type estimator was proposed.
This model is the starting point of our article.

In the sequel we will consider a sample drawn from random elements with values in an infinite
dimensional vector space: $X_1, \ldots, X_n$. Here $X_i = X_i(\cdot)$ is a random function defined, say, on a compact
interval of the real line $[0,T]$. We will also assume once and for all that the $X_i$'s take their values in
a separable Hilbert space denoted $H$. This Hilbert space is endowed with an inner product $\langle\cdot,\cdot\rangle$ from
which is derived the norm $\|\cdot\|$. Such techniques as wavelets or splines yield reconstructed curves in the
(Hilbert) Sobolev spaces

$$W^{m,2} = \left\{ f \in L^2([0,T]) : f^{(m)} \in L^2([0,T]) \right\}$$

where $f^{(m)}$ denotes the $m$-th derivative of $f$. Further information on Sobolev spaces may be found in
Adams and Fournier (2003). However for the sake of generality we will consider $H$ as the sequence space
$l_2$ and any vector $x$ will be classically decomposed in a basis, say $(e_i)_{i\in\mathbb{N}}$, so that:

$$x = \sum_{i=1}^{+\infty}\langle x, e_i\rangle e_i.$$
We are given a sample $(y_i, X_i)_{1\le i\le n} \in (\mathbb{R}\times H)$ of independent and identically distributed data.
Let $m$ be the regression function that maps $H$ onto $\mathbb{R}$.

The model is a classical non parametric regression model:

$$y_i = m(X_i) + \varepsilon_i, \quad 1\le i\le n, \quad (1)$$

or, with other symbols:

$$m(x_0) = E(y \mid X = x_0)$$

where $y$ and $X$ stand for random variables with the same distributions as $y_i$ and $X_i$. The noise $\varepsilon$ satisfies
both following assumptions:

$$E(\varepsilon \mid X) = 0,$$

and the noise variance $\sigma_\varepsilon^2$ does not depend on $X$. The issue of the expectation of $X$ (should the $X$'s be centered or not?)
is not crucial; it will be addressed later on but for simplicity we assume that $E(X) = 0$. Let $x_0$ be a
fixed and known point in $H$. We are aiming at estimating $m(x_0)$.

In finite dimension, and more precisely when $X_i$ is a real-valued random variable, $m(x_0)$ may be
estimated by considering an affine approximation of $m$ around $x_0$:

$$m(x) \approx m(x_0) + m'(x_0)(x - x_0)$$

when $x$ is close to $x_0$. This approach leads us to finding a solution to the following minimization problem:

$$\min_{a,b\in\mathbb{R}}\sum_{i=1}^{n}\left(y_i - a - b(x_0 - X_i)\right)^2 K\left(\frac{x_0 - X_i}{h}\right) \quad (2)$$

which is nothing but a weighted least squares program (weighted by the $K((x_0 - X_i)/h)$'s). Here $K$ is
a kernel: a measurable positive function such that $\int K = 1$, and $h = h_n$ is the bandwidth indexed by the
sample size. Then $a^*$, the first component of the solution $(a^*, b^*)$ of the display above, is the estimate of $m(x_0)$. As a special
case taking $b = 0$ comes down to the classical Nadaraya-Watson estimator. We refer the interested reader
to Nadaraya (1964) and Fan (1993) about this topic. The generalization of (2) to higher orders (namely
approximating $m$ locally by a polynomial) gives birth to the local polynomial estimate of $m(x_0)$. We refer
for instance to Chen (2003) for a recent article. Convergence in probability and asymptotic normality
of the kernel polynomial estimators for a density function, variable bandwidth and local linear regression
smoothers, were studied by Fan and Gijbels (1992).
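The scalar program (2) can be sketched numerically as follows. This is only an illustration under stated assumptions: the function names `epanechnikov`, `local_linear_1d` and `nadaraya_watson_1d` are hypothetical, and the Epanechnikov kernel is one possible choice of $K$, not one prescribed by the text.

```python
import numpy as np

def epanechnikov(t):
    """Illustrative kernel: nonnegative, supported on [-1, 1], integrates to 1."""
    return 0.75 * np.clip(1.0 - t**2, 0.0, None)

def local_linear_1d(x0, X, y, h, kernel=epanechnikov):
    """Return a*, the intercept of the weighted least squares fit in program (2)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = kernel((x0 - X) / h)                        # weights K((x0 - X_i)/h)
    D = np.column_stack([np.ones_like(X), x0 - X])  # columns for a and b
    sw = np.sqrt(w)                                 # sqrt-weights turn WLS into OLS
    coef, *_ = np.linalg.lstsq(D * sw[:, None], y * sw, rcond=None)
    return coef[0]

def nadaraya_watson_1d(x0, X, y, h, kernel=epanechnikov):
    """Special case b = 0 of program (2): a kernel-weighted mean of the y_i."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = kernel((x0 - X) / h)
    return np.sum(w * y) / np.sum(w)
```

On noise-free data from an affine function, the local linear fit is exact even at the boundary of the design, whereas the weighted mean is pulled toward the interior; this is the kind of bias reduction that motivates the local linear approach.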

When $X$ belongs to a Hilbert space, the principle remains the same. The function $m$ is now
approximated by:

$$m(x) \approx m(x_0) + \langle\varphi(x_0), x - x_0\rangle$$

where $\varphi(x_0) \in H$ is the first order derivative of $m$ at $x_0$ (the gradient in fact) and the local linear
estimate of $m$ at $x_0$ stems from the following adapted weighted least squares program:

$$\min_{a\in\mathbb{R},\,\varphi\in H}\sum_{i=1}^{n}\left(y_i - a - \langle\varphi, X_i - x_0\rangle\right)^2 K\left(\frac{\|X_i - x_0\|}{h}\right) \quad (3)$$

At last the estimate $\hat m_n(x_0)$ of $m(x_0)$ is $a^*$, solution of (3). We refer to Barrientos-Marin, Ferraty, Vieu
(2007) for another approach. These authors consider a program simplified from the one above (they
replace the functional parameter $\varphi$ by a scalar one). But display (3) seems to be a true generalization of
(2) since $\varphi$, like $b$, estimates the derivative of $m$.

Remark 1 Investigating higher order approximations turns out to be especially difficult in this functional
setting. For instance a local quadratic estimate involves the second order derivative of $m$ (the Hessian
operator) which is a symmetric positive operator on $H$. The local linear method appears as a good trade-off
between the complexity of the method and its accuracy.

However solving (3) is not easy in this framework. The aim of the present work is to provide a bound
for the mean square error of the estimate $a^* = \hat m_n(x_0)$ of $m(x_0)$, that is:

$$E\left[\hat m_n(x_0) - m(x_0)\right]^2$$

through a classical bias-variance decomposition. The paper is organized as follows: the next two
subsections are devoted to pointing out the two main problems that arise from the model and that are
symptomatic of the functional framework. The needed assumptions are commented, then the central
result is given before the last section which contains all the mathematical derivations.
1.2 The estimate and the ill-posed problem

In order to go ahead we need to define two linear operators from $H$ to $H$ (the first is non-random, the
second is random, based on the sample). The usual sup-norm for an operator $T$ will be denoted:

$$\|T\|_\infty = \sup_{x\in B_1}\|Tx\|$$

where $B_1$ stands for the closed unit ball of $H$. From now on the reader is assumed to be familiar with basic
notions related to the theory of bounded and unbounded linear operators on Hilbert space. A wide
literature exists on this topic, which is central in mathematical science. Some of our references are
Weidman (1980), Akhiezer, Glazman (1981), Dunford, Schwartz (1988), Gohberg, Goldberg, Kaashoek
(1991) amongst many others.

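Definition 2 The theoretical local covariance operator of $X$ at $x_0 \in H$ associated with the kernel $K$ is
defined by:

$$\Gamma_K = E\left[K\left(\frac{\|X_1 - x_0\|}{h}\right)\left((X_1 - x_0)\otimes(X_1 - x_0)\right)\right]$$

and its empirical counterpart is:

$$\Gamma_{n,K} = \frac{1}{n}\sum_{k=1}^{n}K\left(\frac{\|X_k - x_0\|}{h}\right)\left((X_k - x_0)\otimes(X_k - x_0)\right)$$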

Remark 3 In fact neither $\Gamma_K$ nor $\Gamma_{n,K}$ are truly covariance operators since the involved random
elements are not centered; they could also be named "local second order moment operators". Also note that
$\Gamma_K$ depends on $h$ and $h$ will depend on the sample size $n$. So the reader must keep in mind that the index
$n$ was dropped in the notation $\Gamma_K$.

It is important to give some basic properties of these operators. We list those which will be useful
in the sequel:

• $\Gamma_K$ and $\Gamma_{n,K}$ are self-adjoint and trace-class, hence compact, whenever $K$ has compact support.

• Both operators tend to zero when $h$ does. Indeed:

$$\|\Gamma_K\|_\infty \le E\left(K\left(\frac{\|X_1 - x_0\|}{h}\right)\|X_1 - x_0\|^2\right) \le Ch^2$$

as will be shown in the section devoted to mathematical derivations. The operator $\Gamma_{n,K}$ also tends
to 0 as a consequence of the strong law of large numbers for sequences of independent Banach
valued random variables (see Ledoux-Talagrand (1991)).

• When $\Gamma_K$ is one to one its inverse exists. Sufficient conditions on $K$ and on $X$ for $\Gamma_K$ to be injective
are not difficult to find but this interesting issue is slightly beyond the scope of this work. Then $\Gamma_K^{-1}$
is an unbounded linear operator acting from a dense domain of $H$ onto $H$. It should be stressed
that $\Gamma_K^{-1}$ is continuous at no point of its domain (it is nowhere continuous).

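Imagine that the distribution of the data (namely of the couple $(y, X)$) is known. We could consider
solving, instead of (3):

$$\min_{a\in\mathbb{R},\,\varphi\in H}E\left[\left(y - a - \langle\varphi, X - x_0\rangle\right)^2 K\left(\frac{\|X - x_0\|}{h}\right)\right] \quad (5)$$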



The first stumbling stone appears within the next Proposition.

Proposition 4 Even when the distribution of the data is known, the solution $a_h^*$ of the "theoretical"
program (5) exists only when $\Gamma_K$ is one to one. Then $a_h^*$ is the solution of a linear inverse problem
which involves the unbounded inverse (whenever it exists) of $\Gamma_K$:

$$a_h^* = \frac{E(yK) - \langle\Gamma_K^{-1}E(yKZ), E(KZ)\rangle}{E(K) - \langle\Gamma_K^{-1}E(KZ), E(KZ)\rangle} \quad (6)$$
where, for the sake of shortness, we denoted

$$Z(x_0) = Z = X - x_0 \quad\text{and}\quad K = K\left(\|X - x_0\|/h\right).$$
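The quotient in Proposition 4 can be read off the two first order conditions of the theoretical program (5). The lines below are only a sketch of that computation: they assume that $\Gamma_K$ is one to one and that $E(yKZ)$ and $E(KZ)$ lie in the domain of $\Gamma_K^{-1}$, and the symbol $\varphi^*$ for the minimizing direction is introduced here purely for illustration.

```latex
\begin{align*}
\partial_a:\quad & a\,E(K) + \langle\varphi, E(KZ)\rangle = E(yK),\\
\partial_\varphi:\quad & a\,E(KZ) + \Gamma_K\varphi = E(yKZ)
  \;\Longrightarrow\;
  \varphi^* = \Gamma_K^{-1}\bigl(E(yKZ) - a\,E(KZ)\bigr),\\
\text{hence}\quad &
  a\,\bigl[E(K) - \langle\Gamma_K^{-1}E(KZ), E(KZ)\rangle\bigr]
  = E(yK) - \langle\Gamma_K^{-1}E(yKZ), E(KZ)\rangle .
\end{align*}
```

The second line uses $E\left[K\langle\varphi, Z\rangle Z\right] = \Gamma_K\varphi$, which is the definition of $\Gamma_K$; dividing the last identity by the bracket gives the announced expression of $a_h^*$.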

The problem gets deeper when we go back to the original and empirical program (3). It turns out that
the solution cannot be explicitly written since $\Gamma_{n,K}$ (which now replaces $\Gamma_K$) has no inverse because
it has a finite rank. Its rank is clearly bounded by $n$.

A classical remedy consists in replacing $\Gamma_{n,K}^{-1}$ by a bounded operator $\Gamma_{n,K}^{\dagger}$ depending on $n$ and such
that $\Gamma_{n,K}^{\dagger}$ "behaves pointwise" like the inverse of $\Gamma_{n,K}$. This inverse operator, which is not always the
Moore-Penrose pseudo inverse, will be called the regularized inverse of $\Gamma_{n,K}$. Several procedures could be
carried out.

• Truncated spectral regularization : here this method matches the usual "Moore-Penrose" pseudo
inversion, hence $\Gamma_{n,K}\Gamma_{n,K}^{\dagger}$ and $\Gamma_{n,K}^{\dagger}\Gamma_{n,K}$ are both projection operators on $H$. In fact if the spectral
decomposition of $\Gamma_{n,K}$ is $\Gamma_{n,K} = \sum_{i\in\mathbb{N}}\mu_{i,n}\,(u_{i,n}\otimes u_{i,n})$ where for all $i$ the $(\mu_{i,n}, u_{i,n})$ are the
eigenvalues/eigenvectors of $\Gamma_{n,K}$ (we will always assume that the positive $\mu_{i,n}$'s are arranged in decreasing
order):

$$\Gamma_{n,K}^{\dagger} = \sum_{i=1}^{N_n}\frac{1}{\mu_{i,n}}\,(u_{i,n}\otimes u_{i,n}), \quad (7)$$

where $N_n \le n$.



• Penalization : Now $\Gamma_{n,K}^{\dagger} = (\Gamma_{n,K} + \alpha_n S)^{-1}$ where $\alpha_n$ is a (positive) sequence tending to zero and
$S$ is a known operator chosen so that $\Gamma_{n,K} + \alpha_n S$ has a bounded inverse. Here $S$ may be taken to
be the identity operator.



• Tikhonov regularization : It comes down here, since $\Gamma_{n,K}$ is symmetric, to taking:

$$\Gamma_{n,K}^{\dagger} = \left(\Gamma_{n,K}^2 + \alpha_n I\right)^{-1}\Gamma_{n,K}.$$

The sequence $\alpha_n$ is again positive and tends to zero.

Several other methods exist. The reader is referred for instance to Tikhonov, Arsenin (1977), Groetsch
(1993) or Engl, Hanke, Neubauer (2000).
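For a finite matrix standing in for $\Gamma_{n,K}$, the three regularized inverses above can be sketched through one eigendecomposition. The function name `regularized_inverse` and its arguments are hypothetical, the penalization branch assumes $S$ is the identity, and a symmetric positive semi-definite matrix `G` is assumed as input.

```python
import numpy as np

def regularized_inverse(G, method, N=None, alpha=None):
    """Bounded stand-in for the inverse of a symmetric PSD matrix G.

    method: "spectral" (keep the N largest eigenvalues, assumed positive),
            "penalization" (assumes S = identity), or "tikhonov".
    """
    G = 0.5 * (G + G.T)                   # enforce exact symmetry
    mu, U = np.linalg.eigh(G)             # eigenvalues in ascending order
    mu, U = mu[::-1], U[:, ::-1]          # rearrange in decreasing order
    if method == "spectral":
        inv_mu = np.zeros_like(mu)
        inv_mu[:N] = 1.0 / mu[:N]         # invert only the N leading eigenvalues
    elif method == "penalization":
        inv_mu = 1.0 / (mu + alpha)       # spectrum of (G + alpha I)^{-1}
    elif method == "tikhonov":
        inv_mu = mu / (mu**2 + alpha)     # spectrum of (G^2 + alpha I)^{-1} G
    else:
        raise ValueError(f"unknown method: {method}")
    return (U * inv_mu) @ U.T             # U diag(inv_mu) U^T
```

With the spectral cut, `G @ regularized_inverse(G, "spectral", N=N)` is the orthogonal projection onto the `N` leading eigenvectors, which matches the projection property stated for the truncated spectral method.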

Remark 5 In all situations it should be noted that : 



sup 

n 

sup 



< 



< 



All these regularizing methods may be applied to $\Gamma_K$ as well and lead to $\Gamma_K^{\dagger}$; this operator depends
on $n$ even if this index does not explicitly appear. One may then prove that for all $x$ in the domain of
$\Gamma_K^{-1}$, $\Gamma_K^{\dagger}x \to \Gamma_K^{-1}x$ when $n$ goes to infinity. In addition to the boundedness, the operator $\Gamma_K^{\dagger}$ is always
selfadjoint and positive.

We are now in a position to propose an estimate for $m(x_0)$. This estimate will depend on the
chosen regularization technique applied to $\Gamma_{n,K}$. We will see that, under suitable conditions on $\Gamma_{n,K}^{\dagger}$, the
convergence of our estimate does not depend on the choice of $\Gamma_{n,K}^{\dagger}$.

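Proposition 6 The local linear estimate of $m(x_0)$ is the real solution $\hat m_n(x_0)$ of (3):

$$\hat m_n(x_0) = \frac{\sum_{i=1}^{n}y_i\,\omega_{i,n}}{\sum_{i=1}^{n}\omega_{i,n}}, \quad (8)$$

where

$$\omega_{i,n} = K\left(\frac{\|X_i - x_0\|}{h}\right)\left(1 - \left\langle X_i - x_0, \Gamma_{n,K}^{\dagger}Z_{K,n}\right\rangle\right)$$

and

$$Z_{K,n} = \frac{1}{n}\sum_{k=1}^{n}K\left(\frac{\|X_k - x_0\|}{h}\right)(X_k - x_0).$$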
The proof of this Proposition is omitted since it stems from calculations similar to those carried
out in the proof of Proposition 4.

It is easy to check that (8) is the empirical counterpart of (6). We finally see that $\hat m_n(x_0)$ may be
viewed as a linear combination of the outputs $y_1, \ldots, y_n$ and may be expressed from $a_h^*$ just by replacing
expectations by sums. The reader may also compare our estimate with its one-dimensional counterpart
(display 2.2 p. 198 in Fan (1993)) and will also notice that the nice properties of the $\omega_{i,n}$'s in this setting
do not hold anymore (see display 2.5 p. 198 in Fan (1993) and the lines below).
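For curves observed on a common regular grid, the weights of display (8) can be computed as in the sketch below. Everything specific here is an assumption made for illustration: the name `local_linear_functional`, the grid step `dt` used to approximate the inner product by a Riemann sum, the naive kernel $K = 1_{[0,1]}$, and the truncated spectral regularization with `n_components` eigenvalues.

```python
import numpy as np

def local_linear_functional(x0, X, y, h, n_components, dt=1.0):
    """Sketch of display (8) for discretized curves.

    X: (n, p) array of curves, x0: (p,) array, y: (n,) responses.
    The inner product <f, g> is approximated by dt * sum(f * g).
    Assumes at least one curve falls in the ball of radius h around x0.
    """
    Z = X - x0                                    # rows are X_i - x0
    norms = np.sqrt(dt * np.sum(Z**2, axis=1))    # ||X_i - x0||
    K = (norms <= h).astype(float)                # naive kernel weights
    n = len(y)
    # Matrix of Gamma_{n,K}: x -> (1/n) sum_i K_i <Z_i, x> Z_i
    G = dt * (Z * K[:, None]).T @ Z / n
    mu, U = np.linalg.eigh(G)
    mu, U = mu[::-1][:n_components], U[:, ::-1][:, :n_components]
    keep = mu > 1e-12 * mu[0]                     # drop numerically null directions
    G_dag = (U[:, keep] / mu[keep]) @ U[:, keep].T
    Z_bar = (K[:, None] * Z).mean(axis=0)         # Z_{K,n}
    omega = K * (1.0 - dt * Z @ (G_dag @ Z_bar))  # weights omega_{i,n}
    return np.sum(omega * y) / np.sum(omega)
```

Since the estimate is a ratio of weighted sums with the same weights, a constant response is reproduced exactly whatever the regularization, and an exactly affine regression function is recovered when the truncation keeps all directions of a full-rank local operator.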

The next section is devoted to developing the framework as well as the assumptions needed to get our 
central result. 

2 Assumptions and framework 

In all the sequel we assume : 

A1 : The kernel $K$ is one-sided, defined on $[0,1]$, bounded and $K(1) > 0$. Besides $K'$ is also defined
on $[0,1]$, is non-null and belongs to ([0, 1]).

We did not try to find minimal conditions on the kernel. However the assumption $K(1) > 0$ is rather
rarely required in the non-parametric literature, to the authors' knowledge, and is essential here.

2.1 The small ball problem and the class Gamma 

Consider the one-dimensional version of our model (1) and take $X \in \mathbb{R}$ with density $f$. Fan (1993) studied
the minimax properties of the local linear estimate in this setting and gave the Mean Square Error (see
Theorem 2 p. 199). This MSE depends on $f(x_0)$. Here appears the second major problem. When the
data belong to an infinite-dimensional space, their density does not exist, in the sense that Lebesgue's
measure, or any universal reference measure with similar properties, does not exist. Consequently we
must expect serious trouble since it is plain that the density of the functional input $X$ cannot be defined
as easily as if it were real or even multivariate. Once again this problem will not be managed by just letting
the dimension tend to infinity and we should find a way to overcome this major concern.

It turns out that in many computations of expectations the problem mentioned above may be shifted
to what is known in probability theory as small ball problems. Roughly speaking, if $\psi$ is a real valued
function (we set $x_0 = 0$ for simplicity), $E\left(\psi(\|X\|)K(\|X\|/h)\right)$ may be expressed by means of $P(\|X\| < h)$
and of it only. We refer to the proof section for an immediate illustration. Instead of knowing
and estimating a density we must now focus on $P(\|X\| < h)$ for small $h$ and everyone may understand
why this function is often referred to as the "small ball probability associated with $X$". We propose such
references as Li, Linde (1993), Kuelbs, Li, Linde (1994), Li, Linde (1999) as well as the monograph by
Li, Shao (2001) which provides an interesting state of the art in this area.

What can be said about the function $P(\|X\| < h)$? Obviously, by Glivenko-Cantelli's theorem it
will be easily estimated from the sample (the rate of convergence is non parametric). Besides it is
not hard to see that, under suitable but mild assumptions, if $X \in \mathbb{R}^p$ with density $f : \mathbb{R}^p \to \mathbb{R}^+$,
$P(\|X - x_0\| < h) \asymp h^p f(x_0)$. But this fact leaves unsolved the question: what can be said when
$p \to +\infty$?
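The empirical counterpart of the shifted small ball probability is simply the proportion of sample curves falling in the ball of radius $h$ around $x_0$. A minimal sketch, assuming curves discretized on a common grid with step `dt` (both the name `empirical_small_ball` and the Riemann-sum norm are illustrative choices):

```python
import numpy as np

def empirical_small_ball(x0, X, h, dt=1.0):
    """Proportion of rows X_i of X with ||X_i - x0|| < h.

    The norm is approximated by sqrt(dt * sum of squares) on the grid.
    """
    norms = np.sqrt(dt * np.sum((np.asarray(X) - np.asarray(x0)) ** 2, axis=1))
    return np.mean(norms < h)
```

As a function of $h$ this is an empirical distribution function of the distances $\|X_i - x_0\|$, which is why the Glivenko-Cantelli theorem applies.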

In probability theory most of the small ball considerations focused on the case where $X$ is the Brownian
motion, the Brownian bridge or some known relatives. Several norms were investigated as well. Most of
the theorems collected in the literature yield:

$$P(\|X\| < h) \asymp C_1 h^{\alpha}\exp\left(-\frac{C_2}{h^{\beta}}\right) \quad (9)$$

where $\alpha$, $\beta$, $C_1$ and $C_2$ are positive constants. The symbol $\asymp$ is sometimes replaced by the more precise
$\sim$. Another serious problem comes from the fact that the $C^{\infty}$ function on the right in the display above
has null derivatives at zero at all orders. Other results assess that, when $x_0$ belongs to the Reproducing
Kernel Hilbert Space of $X$,

$$P(\|X - x_0\| < h) \sim C_{x_0}P(\|X\| < h)$$

where $C_{x_0}$ does not depend on $h$ but on $x_0$ and on the distribution of $X$. Two major contributions will
be found in Mayer-Wolf, Zeitouni (1993) and in Dembo, Mayer-Wolf, Zeitouni (1995). The authors give
the exact asymptotic of $P(\|X\|_{l_2} < h)$ when $X$ is an $l_2$-valued gaussian random element (by means of large
deviation theory):

$$X = (a_1x_1, a_2x_2, \ldots) \quad (10)$$
with the $x_i$ independent, $N(0,1)$-distributed and $\sum a_i^2 < +\infty$. When $a_i \asymp i^{-r}$ ($r > 1/2$) they obtain a
formula similar to (9). Recently Mas (2007b) derived the estimate when $a_i = \exp(-ci)$, $c > 0$ and got:

P {\\X\\ <h)^C, [log {l/h)]-''^ exp (-C2 [log {h)f) . (11) 

A striking fact is that both functions in (9) and (11) belong to a class of functions known in the
theory of regular variations: the class Gamma introduced and studied by de Haan (1971) and (1974).
This class arises in the theory of extreme values and is closely related to the domain of attraction of the
double exponential distribution. It was initially introduced by de Haan as a "Form of Regular Variation".
We provide now the definition of the class Gamma at 0, denoted $\Gamma_0$.

Definition 7 A function $f$ belongs to de Haan's class $\Gamma_0$ with auxiliary function $\rho$ if $f$ maps a positive
neighborhood of 0 onto a positive neighborhood of 0, $f(0) = 0$, $f$ is non decreasing, $\rho(0) = 0$ and for all
$x \in \mathbb{R}$:

$$\lim_{s\downarrow 0}\frac{f(s + x\rho(s))}{f(s)} = \exp(x) \quad (12)$$
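The limit in (12) can be checked numerically on a simple member of the class. The sketch below takes $f(s) = \exp(-C_2/s^{\beta})$, that is the exponential factor of (9) without its polynomial prefactor, and the candidate auxiliary function $\rho = f/f'$, which gives $\rho(s) = s^{1+\beta}/(C_2\beta)$. Both choices are assumptions made for illustration, as are the function names; the ratio is computed on the log scale because $f(s)$ underflows for small $s$.

```python
import math

def log_f(s, beta=2.0, c2=1.0):
    """log of f(s) = exp(-c2 / s**beta)."""
    return -c2 / s**beta

def rho(s, beta=2.0, c2=1.0):
    """Candidate auxiliary function f / f' for the f above."""
    return s ** (1.0 + beta) / (c2 * beta)

def gamma_ratio(s, x, beta=2.0, c2=1.0):
    """f(s + x rho(s)) / f(s), evaluated through logarithms."""
    shifted = s + x * rho(s, beta, c2)
    return math.exp(log_f(shifted, beta, c2) - log_f(s, beta, c2))
```

For small `s`, `gamma_ratio(s, x)` should be close to `math.exp(x)`, in line with the definition of the class.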

In a recent manuscript, Mas (2008) proved that, in the framework of Dembo, Mayer-Wolf, Zeitouni
(1995), the small ball probability of any random element that may be defined like display (10) belongs
to the class Gamma. A work is in progress to prove that, under suitable assumptions on the auxiliary
function, the converse also holds. The auxiliary functions appearing in displays (9) and (11) may be
easily computed. Mas (2008) proved that $\rho$ depends only on the sequence $a(\cdot)$ that defines $X$ in (10).

The next Proposition illustrates the Definition above and will be useful in the section devoted to the 
main results. 

In all the sequel and especially within the proof section, C denotes a constant (which will vary from 
a theorem to another). 

Proposition 8 When the small ball probability is defined by the right hand side of (9) the function $\rho$ is

$$\rho(s) \sim Cs^{1+\beta} \quad (13)$$

with $\beta > 0$, and when the small ball probability is defined by the right hand side of (11) the function $\rho$
is:

$$\rho(s) \sim \frac{Cs}{|\log s|} \quad (14)$$

Starting from all these considerations it seems reasonable to assume the following:
A2 : Let

$$F(h) = F_{x_0}(h) = P(\|X - x_0\| < h)$$

be the shifted small ball probability of $X$. We assume that $F \in \Gamma_0$ with auxiliary function $\rho$.

Gamma varying functions feature original properties and we give now one of them which will be useful 
later in the proof section. We refer to Proposition 3.10.3 and Lemma 3.10.1 p. 175 in Bingham, Goldie, 
Teugels (1987). 

Proposition 9 If $F \in \Gamma_0$ with auxiliary function $\rho$ then for all $x \in [0,1[$,

$$\lim_{h\to 0}\frac{F(hx)}{F(h)} = 0 \quad (15)$$

$$\lim_{h\to 0}\frac{\rho(h)}{h} = 0 \quad (16)$$

Assumption A2 is central to tackle our problem since the mean square error, computed from our 
estimate actually depends on p. But additional assumptions should hold, especially on the distributions 
of the margins of X. 
2.2 Assumptions on the marginal distributions

The next assumption essentially aims at simplifying the technique of proof but could certainly be
alleviated at the expense of more tedious calculations (see also Mas (2007b) and comments therein).

A3 : There exists a basis $(e_i)_{i\ge 1}$ such that the margins $(\langle X, e_i\rangle)_{i\ge 1}$ are independent real random
variables.

In all the sequel, $f_i = f_{i,x_0}$ stands for the density of the real-valued random variable $\langle X - x_0, e_i\rangle$.
The behavior around 0 of the shifted density $f_i$ is crucial, like in the finite dimensional setting. It has
to be smooth in a sense that is going to be made clearer now. Note that $f_i(0)$ is nothing but the
density of the non-shifted random variable $\langle X, e_i\rangle$ evaluated at $\langle x_0, e_i\rangle$.

Let $V_0$ be a fixed neighborhood of 0, set

$$\alpha_i = \sup_{u\in V_0}\left|\frac{f_i(u) - f_i(-u)}{u\left(f_i(u) + f_i(-u)\right)}\right|$$

and assume that:

A4 : $\sum_{i=1}^{+\infty}\alpha_i^2 < +\infty$.

This assumption is close to those required in Mas (2007b). The next Example illustrates assumption
A4 in the important case when $X$ is gaussian.

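Example 10 Let $X$ be a centered gaussian random element in $H$ with Karhunen-Loève expansion:

$$X = \sum_{k=1}^{+\infty}\sqrt{\lambda_k}\,\eta_k e_k.$$

Here the $\lambda_k$'s are the eigenvalues of the covariance operator of $X$, $E(X\otimes X)$, the $e_k$'s are the associated
eigenvectors and the $\eta_k$'s are real-valued random variables, $N(0,1)$-distributed. It is a well-known fact
that the $\langle X, e_k\rangle = \sqrt{\lambda_k}\eta_k$ are independent real gaussian random variables and A3 holds. Then

$$f_i(u) = \frac{1}{\sqrt{2\pi\lambda_i}}\exp\left(-\frac{(u - \langle x_0, e_i\rangle)^2}{2\lambda_i}\right)$$

and

$$\sup_{u\in V_0}\left|\frac{f_i(u) - f_i(-u)}{u\left(f_i(u) + f_i(-u)\right)}\right| \sim \frac{|\langle x_0, e_i\rangle|}{\lambda_i}$$

whenever $\langle x_0, e_i\rangle/\lambda_i \to 0$ when $i$ tends to infinity, and A4 holds if:

$$\sum_{i=1}^{+\infty}\frac{\langle x_0, e_i\rangle^2}{\lambda_i^2} < +\infty \quad (17)$$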
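The order of magnitude of the coefficient in Example 10 can be checked on a grid. In the sketch below `a` plays the role of $\langle x_0, e_i\rangle$ and `lam` the role of $\lambda_i$; the function name `alpha_gaussian`, the neighborhood $V_0 = [-\texttt{half\_width}, \texttt{half\_width}]$ and the grid size are illustrative assumptions. For the gaussian density the ratio inside the supremum equals $|\tanh(ua/\lambda)/u|$, which is maximal near $u = 0$ with value $|a|/\lambda$.

```python
import numpy as np

def alpha_gaussian(a, lam, half_width=1.0, n_grid=20001):
    """Grid approximation of sup_u |f(u) - f(-u)| / |u (f(u) + f(-u))|
    for the N(a, lam) density f, over u in [-half_width, half_width], u != 0.

    Assumes lam is not so small that f underflows on the grid.
    """
    u = np.linspace(-half_width, half_width, n_grid)
    u = u[u != 0.0]                                   # the ratio is undefined at 0

    def f(t):
        return np.exp(-(t - a) ** 2 / (2.0 * lam)) / np.sqrt(2.0 * np.pi * lam)

    ratio = np.abs((f(u) - f(-u)) / (u * (f(u) + f(-u))))
    return ratio.max()
```

Summing the squares of these values over $i$ gives a numerical handle on condition (17) for a given $x_0$ and a given eigenvalue sequence.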



Example 11 We can also consider the family of densities indexed by the integer m : 

Cm 1 



fi {u) = 



Xi ^ _^ ( u-{xo,e 



( u-{xo,ei) \ ' 
\ VTi J 



where Cm is a normalizing constant. We find : 

XT 

and assumption A4 holds whenever the sequence { 1^^?^'^^ ) € Z2 • 

Since the rate of decrease of the $\lambda_i$'s is intimately related to the smoothness of the random function
$X$, we may easily infer that A4 should be interpreted as a smoothness condition on the function $x_0$. In
other words, the coordinates of $x_0$ in the basis $e_i$ should tend to zero at a rate which is significantly
quicker than the eigenvalues of the covariance operator of $X$, and hence $x_0$ should be sufficiently
smoother than $X$.

It should also be noted that, when the family of densities $f_i$ is not uniformly smooth enough in a
neighborhood of 0, Assumption A4 may fail. For instance, it is not hard to see that the $\alpha_i$'s are not even
finite when $f_i$ is the density of a shifted Laplace random variable:

$$f_i(u) = \frac{1}{2\lambda_i}\exp\left(-\frac{|u - \langle x_0, e_i\rangle|}{\lambda_i}\right)$$
Remark 12 The issue of the expectation of the functional input $X$ should be raised now. We assumed
earlier that the $X_i$'s are centered. But in practical situations we can expect $\mu = E(X)$ to be a non-null
function. Then considering a new shift $x_0 - \mu$ instead of $x_0$ solves the problem. So we can always consider
the centered version of $X$ but we must take into account that any assumption made on $x_0$ should be valid
for $x_0 - \mu$. For instance (17) should be replaced by

$$\sum_{i=1}^{+\infty}\frac{\langle x_0 - \mu, e_i\rangle^2}{\lambda_i^2} < +\infty.$$



2.3 Smoothness of the regression function 

In order to achieve our estimating procedure we cannot avoid assuming that the function $m$ is regular.
Since $m$ is a mapping from $H$ to $\mathbb{R}$, its first order derivative is an element of $\mathcal{L}(H,\mathbb{R})$, the space of bounded
linear functionals from $H$ to $\mathbb{R}$, which is nothing but $H^* \simeq H$. As announced earlier $m'(x_0) \in H$.
The second order derivative belongs to $\mathcal{L}(H,\mathcal{L}(H,\mathbb{R})) \simeq \mathcal{L}(H\times H,\mathbb{R})$ and is consequently a quadratic
functional on $H\times H$ and may be represented by a symmetric positive linear operator from $H$ to $H$
(the Hessian operator). We will sometimes use abusive notations such as $\langle m''(x_0)(u), v\rangle$ below and
throughout the proofs.

A5 : The first order derivative of $m$ at $x_0$, $m'(x_0)$, is defined, non null and there exists a neighborhood
$V(x_0)$ of $x_0$ such that:

$$\sup_{x\in V(x_0)}\|m''(x)\|_\infty < +\infty.$$

This last display may be rewritten: for all $u$ in $H$ and all $x$ in a neighborhood of $x_0$

$$\langle m''(x)u, u\rangle \le C\|u\|^2$$

Remark 13 Assumption A5 states in a way that "the second order derivative of $m$ in a neighborhood
of $x_0$ is bounded".

2.4 Back to the regularized inverse 

We need for immediate purpose to define a sequence involved in the rate of convergence of our estimate. 
Definition 14 Let $v(h)$ be the positive sequence defined by:

$$v = v(h) = E\left[K\left(\frac{\|X - x_0\|}{h}\right)\|X - x_0\|\,\rho\left(\|X - x_0\|\right)\right] \quad (18)$$

It is plain that $v$ tends to zero when $h$ does.

Since they will be used in the sequel we list now some results from Mas (2007b). They are collected
in the next Proposition and consist in bounding the norms of operators $\Gamma_K$ and $\Gamma_{n,K}$.



Proposition 15 The following bounds are valid:

$$\|\Gamma_K\|_\infty \ge Cv(h), \quad (19)$$

$$\|\Gamma_{n,K} - \Gamma_K\|_\infty = O_{L^2}\left(h^2\sqrt{\frac{F(h)}{n}}\right). \quad (20)$$

Besides $\Gamma_K/v(h)$ may converge to a bounded operator, say $S$, that may be compact.

Before giving the main results we have to get back to the regularized inverse of $\Gamma_{n,K}$. Indeed a bound
on the norm of $\Gamma_{n,K}^{\dagger}$ may be derived. Under the assumption that $h^2F^{1/2}(h)/\left(n^{1/2}v(h)\right) \to 0$ we see
that $\|\Gamma_{n,K}\|_\infty \ge Cv(h)$. As a consequence of these facts we expect the norm of $\Gamma_{n,K}^{\dagger}$ to diverge with rate at
least $1/v(h)$ since:



< C < 



rt 



< 



.K 



If the operator $S$ mentioned in the Proposition above is compact, we may even be aware that the norm
of $v(h)\,\Gamma_{n,K}^{\dagger}$ will tend to infinity since $S^{-1}$ is unbounded whenever it exists. All this leads us to
considering the next and last assumption on $\Gamma_{n,K}^{\dagger}$:

A6 : There exists a sequence $r_n \downarrow 0$ such that

$$\max\left\{\|\Gamma_K^{\dagger}\|_\infty, \|\Gamma_{n,K}^{\dagger}\|_\infty\right\} \le \frac{1}{r_n v(h)}.$$

Here the parameter $r_n$ just depends on the chosen regularizing method (penalization, Tikhonov, etc.)
and may be viewed as a tuning parameter.

Remark 16 In fact as will be seen below the sequence $r_n$ may not tend to zero. But the situation when
$r_n \downarrow 0$ is the most unfavorable one and we intend to investigate it with care. However $r_nv(h)$ always
tends to zero and cannot be bounded below because of (19) and (20). Besides if $\Gamma_K/v(h)$ converges to an
operator with bounded inverse, the sequence $r_n$ can always be chosen constant.

Let us take some examples to illustrate the role of $r_n$. We keep the notations of display (7) and of
the lines below.

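• Truncated spectral regularization : recall that

$$\Gamma_{n,K}^{\dagger} = \sum_{i=1}^{N_n}\frac{1}{\mu_{i,n}}\,(u_{i,n}\otimes u_{i,n})$$

where the $(\mu_{i,n}, u_{i,n})$ are the eigenelements of $\Gamma_{n,K}$ and $\|\Gamma_{n,K}\|_\infty = \sup_i\{\mu_{i,n}\} = \mu_{1,n}$ (as announced
earlier the eigenvalues are positive and arranged in decreasing order). Hence

$$\|\Gamma_{n,K}^{\dagger}\|_\infty = \frac{1}{\mu_{N_n,n}} = \frac{\mu_{1,n}}{\mu_{N_n,n}}\,\frac{1}{\mu_{1,n}},$$

then $r_n = \mu_{N_n,n}/\mu_{1,n} \downarrow 0$ is the inverse of the conditioning index of operator $\Gamma_{n,K}^{\dagger}$.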
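• Penalization : Now $\Gamma_{n,K}^{\dagger} = (\Gamma_{n,K} + \alpha_n S)^{-1}$ with, when $S$ is the identity operator,

$$\|\Gamma_{n,K}^{\dagger}\|_\infty \le \frac{1}{\alpha_n}$$

and we can take $r_n = \alpha_n/\mu_{1,n}$. It is possible here to get $r_n \uparrow +\infty$ by an accurate choice of $\alpha_n$ and
some information on $\mu_{1,n}$.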


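• Tikhonov regularization : Here $\Gamma_{n,K}^{\dagger} = \left(\Gamma_{n,K}^2 + \alpha_n I\right)^{-1}\Gamma_{n,K}$ and

$$\Gamma_{n,K}^{\dagger} = \sum_{i}\frac{\mu_{i,n}}{\mu_{i,n}^2 + \alpha_n}\,(u_{i,n}\otimes u_{i,n}).$$

A choice for $r_n$ is here $\alpha_n/\mu_{1,n}^2$ and the same remark as above holds.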
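The three choices of $r_n$ can be compared through the operator norm of each regularized inverse, read on its spectrum. The bounds below are elementary facts about the functions $\mu \mapsto 1/\mu$, $1/(\mu + \alpha_n)$ and $\mu/(\mu^2 + \alpha_n)$ on $[0, \mu_{1,n}]$; the penalization line assumes $S = I$, and the reading "$\|\Gamma_{n,K}^{\dagger}\|_\infty \le 1/(r_n\mu_{1,n})$" is an interpretation offered here as a sketch, not a statement taken from the text.

```latex
\begin{align*}
\text{spectral cut:}\quad
  & \|\Gamma_{n,K}^{\dagger}\|_\infty = \max_{i\le N_n}\frac{1}{\mu_{i,n}}
    = \frac{1}{\mu_{N_n,n}}
    = \frac{1}{\mu_{1,n}}\cdot\frac{\mu_{1,n}}{\mu_{N_n,n}},\\
\text{penalization } (S = I):\quad
  & \|\Gamma_{n,K}^{\dagger}\|_\infty = \sup_i\frac{1}{\mu_{i,n}+\alpha_n}
    \le \frac{1}{\alpha_n}
    = \frac{1}{\mu_{1,n}}\cdot\frac{\mu_{1,n}}{\alpha_n},\\
\text{Tikhonov:}\quad
  & \|\Gamma_{n,K}^{\dagger}\|_\infty = \sup_i\frac{\mu_{i,n}}{\mu_{i,n}^2+\alpha_n}
    \le \min\left(\frac{\mu_{1,n}}{\alpha_n}, \frac{1}{2\sqrt{\alpha_n}}\right),
  \qquad \frac{\mu_{1,n}}{\alpha_n} = \frac{1}{\mu_{1,n}}\cdot\frac{\mu_{1,n}^2}{\alpha_n}.
\end{align*}
```

In each line the factor multiplying $1/\mu_{1,n}$ is the inverse of the proposed $r_n$, so the three choices are consistent with one another once $\mu_{1,n}$ is of the order of $v(h)$.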



3 Statement of the results 

The central result of this article is a bound on the Mean Square Error for the local linear estimate of the
pointwise evaluation of the regression function at a fixed design. In the sequel the generic notation $C$
stands for universal constants.

Theorem 17 Fix $x_0$ in $H$. When assumptions A1-A6 hold and if $nF(h) \to +\infty$:



E (m„ (xo) - m (xo)) <C 



/l6 



C 



1 



jh) 

nF (h) ^ F^ {h) 
V (h) 



nF{h) V nrnv{h) r^F (h) ^ 
where the first line arises from the bias of our estimate and the second stems from its variance. 

Remark 18 If $K$ is chosen to be the naive kernel, $K(s) = 1_{[0,1]}(s)$, assumption A1 can be removed
and the previous theorem remains valid.
Remark 19 It turns out that the variance term is decomposed into three. The first is $(nF(h))^{-1}$ and
is classical (see Ferraty, Mas, Vieu (2007)). The two others stem directly from the underlying inverse
problem and the sequence $r_n$ appears.

Note that we did not fix the issue of the sequence $r_n$ involved in the regularizing inverses $\Gamma_{n,K}^{\dagger}$ and
$\Gamma_K^{\dagger}$. Theorem 17 may be simplified under mild additional assumptions.

Proposition 20 Taking $r_n \asymp h$ then



E(m„ (xa) - m{xa)f 
<Ch^ ^ ^ 



nF (h) \ nv (h) 

This Proposition is derived from Theorem 17 and Lemma 29.

Remark 21 Turning back to Proposition 8 and considering displays (13) and (14), it is not hard to see
that both functions $\rho$ are regularly varying at 0 with index $1 + \beta$ for the first and 1 for the second and
hence that Proposition holds. It should also be noted from property (16) in Proposition 9 that we
can truly expect $\rho$ to be of index larger than 1 whenever it is regularly varying at 0. This fact motivates
the next Proposition.

Proposition 22 Under the assumptions of Theorem \ 1 7| and of Provosition [WA if the auxiliary function 
$\rho$ is regularly varying at 0 with index $q \ge 1$,

$$v(h) \sim h\rho(h)F(h). \quad (21)$$

Then if p (s) > Cs'* in a neighborhood of 0, the mean square error becomes : 



$$E\left(\hat m_n(x_0) - m(x_0)\right)^2 \le C\left[h^4 + \frac{1}{nF(h)}\right]$$

and the rate of decrease of the Mean Square Error depends on $h^*$ given by

$$(h^*)^4F(h^*) \asymp \frac{1}{n}. \quad (22)$$
n 



If p (s) /s^ — > when s — )■ the above rate is damaged. For instance taking r„ x h the MSE becomes : 

E (m„ (^o) - m {xo)f < C (^h' + . 



e 



Remark 23 Display (21) was proved in Mas (2007b). In the first case of Proposition 22, since the
bias term is here an $O(h^4)$, the rate of convergence of our estimate outperforms the one computed in
Ferraty, Mas, Vieu (2007). The estimate there was a classical Nadaraya-Watson kernel estimator whose bias
was an $O(h^2)$. Obviously the rate of convergence in the second case is damaged but even for very irregular
processes such as Brownian motion or Brownian bridge the function $\rho(s)$ stays above a power of $s$ that depends on
the norm that is used. The interested reader is referred for instance to displays (20) and (22) in
Mayer-Wolf, Zeitouni (1993) or Proposition 6.1 p. 568 in Li, Shao (2001) but will have to carry out some
additional computations. It seems reasonable to think that this unfavorable situation will rarely occur in a
usual statistical context (with functions reconstructed on "smooth spaces"). However we prove just below
that, even when $\rho$ decays rapidly to 0, it is always possible to choose a regularizing method for $\Gamma_{n,K}$ that
reaches the best rate.



Remark 24 It may be fruitful for practical purposes to comment on formula (22). First we see that
when $X \in \mathbb{R}^d$, $F(h) \asymp Ch^d$, then the rate of convergence in mean square turns out to be $n^{-4/(4+d)}$ which
is the optimal rate of convergence for a twice-differentiable regression function (see Stone (1982)). When
the small ball probability belongs to the class $\Gamma_0$, this rate will depend on $\rho$. We know that the term $F(h)$
will always tend to 0 quicker than $h^4$ and will consequently determine the choice of $h$. The situation is
consequently more intricate than in the multivariate setting. However following the example of displays
(9) and (11) we get respectively

where /3 < 3 when p{s) > Cs'^. Finally the rate of decrease of the mean square error is a O ^(logn) 
where c > 1. 
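The balance equation $(h^*)^4F(h^*) \asymp 1/n$ of display (22) can be solved numerically once a small ball probability is postulated. The sketch below treats the relation as an exact equation and uses bisection on a logarithmic scale; the name `solve_bandwidth`, the bracketing interval and the two test functions $F(h) = h^d$ and $F(h) = \exp(-1/h^2)$ are illustrative assumptions. It takes $\log F$ as input because $F(h)$ itself underflows for small $h$ in the exponential case.

```python
import math

def solve_bandwidth(n, log_F, lo=1e-6, hi=1.0, iters=200):
    """Solve h**4 * F(h) = 1/n for h in [lo, hi] by bisection.

    log_F: function returning log F(h). Assumes h**4 * F(h) is increasing
    in h and that the root is bracketed by lo and hi.
    """
    def g(h):
        # log of n * h**4 * F(h); the root of g is the sought bandwidth
        return 4.0 * math.log(h) + log_F(h) + math.log(n)

    for _ in range(iters):
        mid = math.sqrt(lo * hi)      # midpoint on the log scale
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return math.sqrt(lo * hi)
```

With $F(h) = h^d$ this returns $h^* = n^{-1/(4+d)}$, hence a mean square error of order $(h^*)^4 = n^{-4/(4+d)}$. With an exponentially small $F$ the solution decreases only logarithmically in $n$, which illustrates why the functional setting leads to much slower rates.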
The last Proposition is devoted to dealing with the situation described along Remark 16: when $r_n$
does not tend to zero. This cannot happen when the regularizing method is the spectral truncation but
may occur when either a penalization or a Tikhonov method is applied. We recall that we cannot
avoid the condition $r_nv(h) \downarrow 0$. We start from Theorem 17.

Proposition 25 When assumptions A1-A6 hold, if $nF(h) \to +\infty$, when the regularizing method allows
to do so, taking $r(h) = 1/\rho(h)$ provides:



Obviously $r_nv(h)$ tends to 0. If the chosen method is penalization such that $\Gamma_{n,K}^{\dagger} = (\Gamma_{n,K} + \alpha_nS)^{-1}$
it suffices to take $\alpha_n = h^*F(h^*)$ to achieve our goal. The proof of this Proposition is easy hence omitted.

Remark 26 The rate obtained at display (23) issued from Proposition 25 should be compared with the
minimax rate obtained by Fan (1993) for scalar inputs. The MSE was then $Ch^4 + C/(nh)$. We see
that, replacing $F(h)$ by $h$ (which is logical if we consider the remark about the multivariate case
in the section devoted to the small ball problems), both formulas match. This fact leads us to
another interesting issue: does this rate inherit the optimal (minimax) properties found by Fan in his
article? This question goes beyond the scope of this article. Besides not much has been done until now
about optimal estimation for functional data, to the authors' knowledge. But there is no doubt that this
issue will be addressed in the near future.

4 Conclusion 

Obviously this article could be the starting point for other issues such as almost sure or weak convergence
of the estimate. Almost all practical aspects were left out on purpose : they will certainly give rise to
another article. However the main goal of this essentially theoretical work was to underline the rather large
scope of our study. We had to draw on ideas from areas as varied as probability theory, functional
analysis, the statistical theory of extremes and inverse problems theory. Finally it turns out that it is possible to
get, in the functional setting, almost the same rate of decay for the bias as in the case of scalar inputs.
The variance involves the small ball probability evaluated at $h$, the selected bandwidth. A drawback
arises with the necessity to introduce a new parameter : the regularizing sequence $r_n$, which depends on
the sample size (more precisely on the bandwidth $h$). We give no guidance on how to choose the bandwidth $h$
in practical situations, but we expect that the ever wider literature on functional data will quickly overcome
this problem by adapting classical methods such as cross-validation.

Another major practical concern lies in the estimation of the unknown auxiliary function $\rho$. Several
tracks already appear to address this issue. One may think of adapting some techniques from extreme value
theory. After all $\rho$ characterizes the extreme behaviour of $\|X\|$ like tail indices for Weibull or Pareto
distributions. The only difference stems from the fact that $\rho$ is a function and not just a real number.
The other idea lies in the article by Mas (2008) where the auxiliary function $\rho$ is explicitly linked with
the eigenvalues of the ordinary covariance operator of $X$. From the estimation of these eigenvalues (which
is a basic procedure) it should be possible to propose a consistent estimation of the auxiliary function as
a by-product.

5 Proofs 

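For the sake of clarity we begin with an outline of the proofs. The following bias-variance decomposition
for $\hat m_n(x_0) - m(x_0)$ holds :

$$
\hat m_n(x_0) - m(x_0)
= \frac{\sum_{i=1}^n y_i\,\omega_{i,n}}{\sum_{i=1}^n \omega_{i,n}} - m(x_0)
= \frac{\sum_{i=1}^n (y_i - m(x_0))\,\omega_{i,n}}{\sum_{i=1}^n \omega_{i,n}}
= \frac{\sum_{i=1}^n (y_i - m(X_i))\,\omega_{i,n}}{\sum_{i=1}^n \omega_{i,n}}
+ \frac{\sum_{i=1}^n (m(X_i) - m(x_0))\,\omega_{i,n}}{\sum_{i=1}^n \omega_{i,n}}.
$$
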
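We denote :

$$T_{b,n} = \frac{\sum_{i=1}^n (m(X_i) - m(x_0))\,\omega_{i,n}}{\sum_{i=1}^n \omega_{i,n}}, \tag{23}$$

$$T_{v,n} = \frac{\sum_{i=1}^n (y_i - m(X_i))\,\omega_{i,n}}{\sum_{i=1}^n \omega_{i,n}} = \frac{\sum_{i=1}^n \omega_{i,n}\,\varepsilon_i}{\sum_{i=1}^n \omega_{i,n}}, \tag{24}$$

where $\varepsilon$ was defined at display (1). Here $T_{b,n}$ is a bias term and $T_{v,n}$ is a variance term. Finally we get

$$E\left[\hat m_n(x_0) - m(x_0)\right]^2 = E\,T_{b,n}^2 + E\,T_{v,n}^2 + 2E\left(T_{b,n}T_{v,n}\right) \tag{25}$$

and since

$$E\left(T_{b,n}T_{v,n}\right) = E\left(T_{b,n}\,E\left(T_{v,n}\,|\,X_1,\dots,X_n\right)\right) = 0$$

computing the mean square error of $\hat m_n(x_0)$ comes down to computing $E\,T_{b,n}^2$ and $E\,T_{v,n}^2$ which will be
done later.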
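The decomposition above is purely algebraic and can be checked numerically for any set of weights with nonzero sum. The following sketch is only an illustration: it uses scalar inputs as a stand-in for curves, a hypothetical regression function `m`, and arbitrary nonnegative weights instead of the weights $\omega_{i,n}$ of the estimate. It also checks by simulation that, conditionally on the inputs, the cross term averages out.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=n)                      # scalar stand-in for the curves X_i (assumption)
m = lambda x: np.sin(x)                     # hypothetical regression function
x0 = 0.3
y = m(X) + 0.1 * rng.normal(size=n)         # y_i = m(X_i) + eps_i
w = np.maximum(1.0 - np.abs(X - x0), 0.0)   # arbitrary weights with nonzero sum

estimate = np.sum(y * w) / np.sum(w)
T_v = np.sum((y - m(X)) * w) / np.sum(w)        # variance term
T_b = np.sum((m(X) - m(x0)) * w) / np.sum(w)    # bias term

# Conditionally on X, T_b is fixed and T_v is centred, so the cross term vanishes on average.
eps = 0.1 * rng.normal(size=(5000, n))
T_v_draws = (eps @ w) / np.sum(w)
cross = np.mean(T_b * T_v_draws)
print(estimate - m(x0), T_b + T_v, cross)
```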

The proof section is divided into two subsections. The first one is devoted to preliminary results
and Lemmas. In the second the main results are derived.



5.1 Preliminary results 

We assume that assumptions Ai— Ag hold once and for all. The next two Lemmas are given for further 
purposes. Their proofs are omitted. The interested reader will find them in Mas (2007b). 

Lemma 27 If $f$ belongs to the class $\Gamma_0$ with auxiliary function $\rho$, then for all $p \in \mathbb{N}$,



tP 



„ ^/(.^)^,„^'^r(^i±i)/, 



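For any $x = \sum_k x_k e_k$ in $H$ and for $i \in \mathbb{N}$ set $\|x\|^2_{\neq i} = \sum_{k \neq i} x_k^2$.

We denote $f_{\neq i}$ the density of $\|X - x_0\|_{\neq i}$. We need to compute both densities $f_{\|X - x_0\|}$ (density of
$\|X - x_0\|$) and $f_{\langle X - x_0, e_i\rangle, \|X - x_0\|}$ (density of the couple $(\langle X - x_0, e_i\rangle, \|X - x_0\|)$).

Lemma 28 We have :

$$f_{\langle X - x_0, e_i\rangle, \|X - x_0\|}(u, v) = \frac{v}{\sqrt{v^2 - u^2}}\, f_i(u)\, f_{\neq i}\left(\sqrt{v^2 - u^2}\right) 1_{\{v \ge |u|\}}, \tag{26}$$

$$f_{\|X - x_0\|}(v) = v \int_{-1}^{1} f_i(vt)\, f_{\neq i}\left(v\sqrt{1 - t^2}\right) \frac{dt}{\sqrt{1 - t^2}}. \tag{27}$$

Besides if $f_{\|X - x_0\|}$ and $f_{\neq i}$ are $\Gamma$-varying for all $i$ then they all have $\rho$ as auxiliary function.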

We begin with more specific computational Lemmas. 

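Lemma 29 Let $\varphi$ be a positive real valued function, bounded on $[0, 1]$ and regularly varying at 0 with
index $q > 1$ and let $p \in \mathbb{N}$ :

$$E\left[K^p\left(\frac{\|X - x_0\|}{h}\right)\varphi\left(\|X - x_0\|\right)\right] \sim K^p(1)\,\varphi(h)\,F(h). \tag{28}$$

As important special cases we mention

$$E\,K\left(\frac{\|X - x_0\|}{h}\right) \sim K(1)F(h), \qquad
E\,K^2\left(\frac{\|X - x_0\|}{h}\right) \sim K^2(1)F(h), \qquad
E\left[\|X - x_0\|^r K\left(\frac{\|X - x_0\|}{h}\right)\right] \sim K(1)F(h)\,h^r.$$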
Proof :

We prove (28) when $p = 1$ and denote $P^{\|X - x_0\|/h}$ the distribution of the random variable $\|X - x_0\|/h$.
Since

$$E\left[K\left(\frac{\|X - x_0\|}{h}\right)\varphi\left(\|X - x_0\|\right)\right] = \int K(u)\,\varphi(hu)\,dP^{\|X - x_0\|/h}(u)$$

and from $K(u)\varphi(hu) = K(1)\varphi(h) - \int_u^1 [K(s)\varphi(hs)]'\,ds$ we get :

$$E\left[K\left(\frac{\|X - x_0\|}{h}\right)\varphi\left(\|X - x_0\|\right)\right] = K(1)\varphi(h)F(h) - \iint_{0 \le u \le s \le 1} [K(s)\varphi(hs)]'\,ds\,dP^{\|X - x_0\|/h}(u).$$

Applying Fubini's Theorem we get

$$E\left[K\left(\frac{\|X - x_0\|}{h}\right)\varphi\left(\|X - x_0\|\right)\right] = K(1)\varphi(h)F(h) - \int_0^1 [K(s)\varphi(hs)]'\,F(hs)\,ds = K(1)\varphi(h)F(h)\left(1 - \mathcal{R}_h\right)$$

with

$$\mathcal{R}_h = \frac{1}{K(1)} \int_0^1 \left[K'(s)\varphi(hs) + hK(s)\varphi'(hs)\right] \frac{F(hs)}{\varphi(h)F(h)}\,ds.$$

Since $F$ is gamma-varying at 0, Proposition 5 tells us that $F(hs)/F(h) \to 0$ as $h \to 0$ for every $s \in \,]0, 1[$.
As $\varphi$ is regularly varying at 0 with, say, index $q > 1$, $\varphi(hs)/\varphi(h) \to s^q$ as $h$ goes to 0. Recall also that
$K'$ is integrable. We deal with

$$h\,\frac{\varphi'(hs)}{\varphi(h)} = h\,\frac{\varphi'(h)}{\varphi(h)} \cdot \frac{\varphi'(hs)}{\varphi'(h)}.$$

Now in Bingham, Goldie and Teugels (1987), the definition of regular variation is given p. 18. From
Theorem 1.7.2b p. 39 we deduce that $\varphi'$ is regularly varying with index $q - 1 > 0$ hence that :

$$\lim_{h \to 0} \frac{\varphi'(hs)}{\varphi'(h)} = s^{q-1}$$

uniformly with respect to $s \in \,]0, 1]$ and by the direct part of Karamata's Theorem p. 28 (applied to
$f = \varphi'$, which has index $q - 1$, with $\sigma = 0$) that :

$$\lim_{h \to 0} h\,\frac{\varphi'(h)}{\varphi(h)} = q$$

which means that $h\varphi'(hs)/\varphi(h)$ converges pointwise to $q s^{q-1}$ (which is integrable with respect to
Lebesgue's measure). Then we can apply Lebesgue's dominated convergence theorem and Proposition 9
(see display (15)) to get $\mathcal{R}_h \to 0$ as $h \to 0$. This last step leads to the announced result.
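The integration by parts used in this proof can be illustrated numerically in the special case $\varphi \equiv 1$. The sketch below is only an illustration under assumptions that are not taken from the text: a hypothetical kernel $K(s) = 1.5 - s$ on $[0, 1]$ (so that $K(1) > 0$) and the illustrative small ball probability $F(t) = \exp(-1/t)$. It evaluates the ratio $E\,K(\|X - x_0\|/h) / (K(1)F(h)) = 1 - K(1)^{-1}\int_0^1 K'(s)\,F(hs)/F(h)\,ds$ with a trapezoidal rule.

```python
import numpy as np

def kernel(s):
    return 1.5 - s               # hypothetical kernel on [0, 1] with K(1) = 0.5 > 0

def kernel_prime(s):
    return -np.ones_like(s)      # derivative of the hypothetical kernel

def ratio(h, num=200001):
    """E K(R/h) / (K(1) F(h)) for F(t) = exp(-1/t), via integration by parts."""
    s = np.linspace(1e-6, 1.0, num)
    # F(hs)/F(h) = exp(-(1/s - 1)/h), computed directly to avoid dividing by a tiny F(h)
    rel = np.exp(-(1.0 / s - 1.0) / h)
    integrand = kernel_prime(s) * rel
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(s))
    return 1.0 - integral / kernel(1.0)

for h in (0.1, 0.01, 0.001):
    print(h, ratio(h))
```

As $h$ decreases the ratio approaches 1, which is the content of the equivalence $E\,K \sim K(1)F(h)$ in this toy setting.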

For the sake of shortness we will sometimes set :

$$Z = X - x_0, \qquad K = K\left(\|X - x_0\|/h\right)$$

and :

$$\bar Z_{K,n} = \frac{1}{n}\sum_{i=1}^n Z_i K_i = \frac{1}{n}\sum_{i=1}^n (X_i - x_0)\,K\left(\|X_i - x_0\|/h\right).$$


The next lemma is crucial.
Lemma 30 We have :

$$\left\|E[ZK]\right\|^2 = \left\|E\left[K\left(\frac{\|X - x_0\|}{h}\right)(X - x_0)\right]\right\|^2 \le C v^2(h).$$
Remark 31 We can measure the sharpness of the previous bound. Indeed a very simple inequality would
give by Lemma 29 :

$$\left\|E[ZK]\right\|^2 \le \left(E\|ZK\|\right)^2 = \left(E\,\|Z\|K\right)^2 \sim Ch^2F^2(h)$$

whereas in view of (18) and, when $\rho$ is regularly varying at 0 with positive index, of Lemma 29

$$\left(E\left[K\left(\frac{\|X - x_0\|}{h}\right)\|X - x_0\|\,\rho\left(\|X - x_0\|\right)\right]\right)^2 \le h^2\rho^2(h)F^2(h).$$

So the bound was improved by a rate of $\rho^2(h) = o(h^2)$.
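Proof :

Computations here are quite similar to, but distinct from, those carried out in Mas (2007b).
We start with projecting $E\left[K(\|X - x_0\|/h)(X - x_0)\right]$ on the basis $(e_i)_{i \in \mathbb{N}}$ mentioned in $A_3$ and compute
thanks to Lemma 28 :

$$E\left[K\left(\frac{\|X - x_0\|}{h}\right)\langle X - x_0, e_i\rangle\right] = \int_0^h K\left(\frac{v}{h}\right)\left(\int_{-v}^{v} \frac{uv}{\sqrt{v^2 - u^2}}\, f_i(u)\, f_{\neq i}\left(\sqrt{v^2 - u^2}\right) du\right) dv. \tag{29}$$

Now we deal with

$$\int_{-v}^{v} \frac{u}{\sqrt{v^2 - u^2}}\, f_i(u)\, f_{\neq i}\left(\sqrt{v^2 - u^2}\right) du
= \int_0^v \frac{u}{\sqrt{v^2 - u^2}}\left(f_i(u) - f_i(-u)\right) f_{\neq i}\left(\sqrt{v^2 - u^2}\right) du$$

hence

$$\left|\int_{-v}^{v} \frac{u}{\sqrt{v^2 - u^2}}\, f_i(u)\, f_{\neq i}\left(\sqrt{v^2 - u^2}\right) du\right|
\le \sup_{0 \le u \le v \le h}\left|\frac{1}{u}\,\frac{f_i(u) - f_i(-u)}{f_i(u) + f_i(-u)}\right| \int_0^v \frac{u^2}{\sqrt{v^2 - u^2}}\left(f_i(u) + f_i(-u)\right) f_{\neq i}\left(\sqrt{v^2 - u^2}\right) du$$

$$\le \alpha_i \int_{-v}^{v} \frac{u^2}{\sqrt{v^2 - u^2}}\, f_i(u)\, f_{\neq i}\left(\sqrt{v^2 - u^2}\right) du.$$

As a consequence of the preceding lines we get

$$\left|E\left[K\left(\frac{\|X - x_0\|}{h}\right)\langle X - x_0, e_i\rangle\right]\right| \le \alpha_i\, E\left[K\left(\frac{\|X - x_0\|}{h}\right)\langle X - x_0, e_i\rangle^2\right]$$
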

leading to 



K ( . 1 (A - xo) 



2 +00 



< sup E 



< Cw(/i)' 



A 



||A - .Toll N 2 

A ( ) (A - .To, Cj) 

11^ - 2;o|| ^ , ,g 
(A - xo,ei) 



2 +00 



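Lemma 32 Both following bounds hold :

$$E\left\|\bar Z_{K,n} - E\bar Z_{K,n}\right\|^2 \le C\,\frac{h^2F(h)}{n},$$

$$E\left\|\bar Z_{K,n} - E\bar Z_{K,n}\right\|^4 \le C\,\frac{h^4F^2(h)}{n^2}.$$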
Proof : We may see $\bar Z_{K,n} - E\bar Z_{K,n}$ as an array of $n$ independent centered random elements with
values in a Hilbert space. Denote :

$$U = ZK - E(ZK).$$

Then $\bar Z_{K,n} - E\bar Z_{K,n} = (1/n)\sum_{k=1}^n U_k$. We limit ourselves to proving the second display in the Lemma,
which is the most technical. It is a slightly painful but however quite simple calculation to get :

$$E\left\|\frac{1}{n}\sum_{k=1}^n U_k\right\|^4
= \frac{1}{n^3}E\|U_1\|^4 + \frac{n-1}{n^3}\left(E\|U_1\|^2\right)^2 + \frac{2(n-1)}{n^3}E\left[\langle U_1, U_2\rangle\langle U_1, U_2\rangle\right]
\le C\left[\frac{1}{n^3}E\|U_1\|^4 + \frac{1}{n^2}\left(E\|U_1\|^2\right)^2\right]$$

where the last bound stems from the first line by Cauchy-Schwarz inequality. We do not want to go too deeply
into steps that may be easily deduced and we hope the reader will agree that, due to the denominator
the first term on the right in the display above may be neglected with respect to the second (even if
$\left(E\|U_1\|^2\right)^2 \le E\|U_1\|^4$). We turn to :

$$E\|U\|^2 = E\left[\|Z\|^2K^2\right] - \left\|E[ZK]\right\|^2.$$

It follows from Lemma 29 and Lemma 30 that

$$\left\|E[ZK]\right\|^2 = o\left(E\left[\|Z\|^2K^2\right]\right)$$

hence that

$$E\|U\|^2 \sim E\left[\|Z\|^2K^2\right] \sim Ch^2F(h)$$

which finishes the proof of Lemma 32.
Lemma 33 We have :

$$E\,\omega_{1,n}^2 \le C\left(F(h) + \frac{h^2F(h)}{n r_n v(h)} + \frac{v(h)}{r_n}\right).$$



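Proof : Developing $\omega_{1,n}^2$ we get

$$\omega_{1,n}^2 = K_1^2\left(1 - \left\langle \Gamma_{n,K}^{\dagger}\bar Z_{K,n}, Z_1\right\rangle\right)^2 \le 2K_1^2 + 2K_1^2\left\langle \Gamma_{n,K}^{\dagger}\bar Z_{K,n}, Z_1\right\rangle^2.$$

We deal essentially with the second term since by Lemma 29 we know that $E K_1^2 = O(F(h))$. We have :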

ri j^K.n.K^zS <ceIv\ j,ZK..n: ^Ik[z ^ ' 



where C is here nothing but sup^ 



for all i we also have : 



^ K (s) . Since the expectation in the above display we bay rewritten 



E ( rl r.ZK - ' ^ 



'^1^0 < -Y.^{^l,KZK,n,^^Z, 
1 " 

T:^^"- {^li,K^K,n,Zi) (^lJ^ZK,n,Z^ 



= CE 
CE 



{{^n,K (l 



since for all $u$ in $H$

$$\frac{1}{n}\sum_{i=1}^n K_i \langle u, Z_i\rangle\langle u, Z_i\rangle = \left\langle \Gamma_{n,K}\,u, u\right\rangle.$$
At last



We set $S_n = \Gamma_{n,K}^{\dagger}\Gamma_{n,K}\Gamma_{n,K}^{\dagger}$ ($S_n$ is a positive symmetric operator) and notice that



^li K^K,n, KiZ^ ) < CE ( (r|^ K^7i,K^n_KZK,7i, Z K,n 



\Sn\L<C 



< 



c 



TnV (h) 



because sup„ 



< +00. 



Our last inequality becomes : 



= CI 

<C\E 
C 

< 



SI/'^Zkji 



-E 



Sl'^ (EZ 



K.: 



-E II {ZK,n - ^Zk.u) f + P 



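We invoke Lemma 32 and Lemma 30 to bound both terms in the preceding display. At last we get

$$E\left(K_1^2\left\langle \Gamma_{n,K}^{\dagger}\bar Z_{K,n}, Z_1\right\rangle^2\right) \le C\left(\frac{h^2F(h)}{n r_n v(h)} + \frac{v(h)}{r_n}\right)$$

which yields the desired result.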
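Lemma 34 When $nF(h) \to +\infty$,

$$\frac{\sum_{i=1}^n K_i}{nK(1)F(h)} \xrightarrow{L^2} 1$$

where $\xrightarrow{L^2}$ denotes convergence in mean square.
Proof :

$$\frac{\sum_{i=1}^n K_i}{nK(1)F(h)} - 1 = \frac{\sum_{i=1}^n (K_i - EK_i)}{nK(1)F(h)} + \frac{EK_1}{K(1)F(h)} - 1.$$

By Lemma 29 the second term tends to zero. We deal with the first one. We note that :

$$E\left(K_i - EK_i\right)^2 = EK_i^2 - \left(EK_i\right)^2 \sim K^2(1)F(h)$$

by Lemma 29 again. Straightforward computations give :

$$E\left[\frac{\sum_{i=1}^n (K_i - EK_i)}{nK(1)F(h)}\right]^2 = O\left(\frac{1}{nF(h)}\right)$$

hence the conclusion.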
Lemma 35 We have 



h'^F^ jh) jh) 



Proof : Since $\Gamma_{n,K}^{\dagger}$ is a positive operator, its square root exists and



^\.K^ K,m Z K,n) — 



.K 



< c 



1/2 _ 

Zk^v, 

1/2 



Zr.u — EZk.1 



1/2 



EZk., 






Then



rl 



< C 



< c 



nj< 
||4 



1/2 



K'. 



From Lemma [30] and Lemma [21] we get : 



nr'^v'^ (h) r^ti^ (ft,) 



5.2 Derivation of the main results 



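We start with a short and simple intermezzo about optimization in Hilbert spaces.
Proof of Proposition 4 :
Consider the program :

$$\min_{a \in \mathbb{R},\, \varphi \in H} E\left[\left(y - a - \langle \varphi, X - x_0\rangle\right)^2 K\left(\frac{\|X - x_0\|}{h}\right)\right].$$

Simple computations lead to

$$\mathcal{L}(a, \varphi) = E\left[\left(y - a - \langle \varphi, Z\rangle\right)^2 K\right]
= C + a^2 EK + \langle \Gamma_K\varphi, \varphi\rangle - 2aE(yK) - 2\langle E[yZK], \varphi\rangle + 2a\langle E(ZK), \varphi\rangle.$$

Obviously $\mathcal{L}(a, \varphi)$ is positive, strictly convex and

$$\lim_{|a|,\, \|\varphi\| \to +\infty} \mathcal{L}(a, \varphi) = +\infty$$

hence $\mathcal{L}(a, \varphi)$ has a single minimum (see Rockafellar (1996) for further information about the minimization
of convex functions). It is also differentiable for all $(a, \varphi)$ in $\mathbb{R} \times H$. We compute its gradient :

$$\nabla\mathcal{L}(a, \varphi) = \begin{pmatrix} 2aEK - 2E(yK) + 2\langle E(ZK), \varphi\rangle \\ 2\Gamma_K\varphi - 2E(yZK) + 2aE(ZK) \end{pmatrix}$$

from which we get the solutions $(a^*, \varphi^*)$ :

$$a^*EK + \langle E(ZK), \varphi^*\rangle = E(yK),$$

$$\Gamma_K\varphi^* = E(yZK) - a^*E(ZK).$$

We see from the second line that $\varphi^*$ is not uniquely defined if $\Gamma_K$ is not one to one. Taking $\varphi^* =
\Gamma_K^{\dagger}\left(E(yZK) - a^*E(ZK)\right)$ we get $\hat m_n(x_0)$ as announced.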
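The empirical counterpart of these normal equations can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation: it assumes curves observed on a common grid (so that inner products are approximated by averages over the grid), a hypothetical kernel with $K(1) > 0$, and a Tikhonov-type regularized inverse $(\Gamma_{n,K} + \alpha I)^{-1}$ with a user-chosen `alpha`. Solving the two equations for $a^*$ gives a weighted average of the responses with weights $K_i\left(1 - \langle Z_i, \Gamma_{n,K}^{\dagger}\bar Z_{K,n}\rangle\right)$. All function and parameter names are hypothetical.

```python
import numpy as np

def kernel(u):
    """Hypothetical kernel supported on [0, 1] with K(1) > 0."""
    return np.where((u >= 0.0) & (u <= 1.0), 1.5 - u, 0.0)

def local_linear_estimate(X, y, x0, h, alpha):
    """Local linear estimate of m(x0) from curves sampled on a common grid.

    X: (n, p) array of discretised curves, y: (n,) responses, x0: (p,) curve.
    The inner product <f, g> is approximated by mean(f * g); the regularised
    inverse is (Gamma + alpha * I)^{-1}, a Tikhonov-type choice (assumption).
    """
    n, p = X.shape
    Z = X - x0                                   # Z_i = X_i - x0
    norms = np.sqrt(np.mean(Z**2, axis=1))       # ||Z_i||
    K = kernel(norms / h)                        # K_i
    if not np.any(K > 0):
        raise ValueError("no curve within distance h of x0")
    Z_bar = (K[:, None] * Z).mean(axis=0)        # (1/n) sum_i K_i Z_i
    Gamma = (Z.T * K) @ Z / (n * p)              # matrix of u -> (1/n) sum_i K_i <Z_i, u> Z_i
    v = np.linalg.solve(Gamma + alpha * np.eye(p), Z_bar)
    weights = K * (1.0 - Z @ v / p)              # K_i (1 - <Z_i, regularised inverse of Gamma applied to Z_bar>)
    return np.sum(weights * y) / np.sum(weights)
```

Two sanity checks follow from the algebra: constant responses are reproduced exactly whatever `alpha` is, and an exactly linear regression function is recovered up to a regularization error that vanishes as `alpha` tends to 0 when the empirical operator is invertible.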

The forthcoming Lemma asserts that the random denominator of our estimate may be replaced by
a non-random one.

Lemma 36 When both ^^ii^a^/,') o,nd ^ tend to zero, the following holds : 

En 



nK{\)F(H) 



T ^ 



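Proof

$$\sum_{i=1}^n \omega_{i,n} = \sum_{i=1}^n K_i - n\left\langle \Gamma_{n,K}^{\dagger}\bar Z_{K,n}, \bar Z_{K,n}\right\rangle$$

hence

$$\frac{\sum_{i=1}^n \omega_{i,n}}{nK(1)F(h)} = \frac{\sum_{i=1}^n K_i}{nK(1)F(h)} - \frac{\left\langle \Gamma_{n,K}^{\dagger}\bar Z_{K,n}, \bar Z_{K,n}\right\rangle}{K(1)F(h)}.$$

From Lemmas 34 and 35 we deduce that the announced Lemma 36 holds.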
5.2.1 Variance term

We study first (see (24)) : $T_{v,n} = \dfrac{\sum_{i=1}^n (y_i - m(X_i))\,\omega_{i,n}}{\sum_{i=1}^n \omega_{i,n}}$. It is plain that $E\,T_{v,n} = 0$.

Denote $\tilde T_{v,n} = \dfrac{\sum_{i=1}^n (y_i - m(X_i))\,\omega_{i,n}}{nK(1)F(h)}$.

We begin with a Proposition. By Lemma 36 just above we know that $T_{v,n} \sim \tilde T_{v,n}$ in $L^2$ sense i.e.



Proposition 37 We have

$$E\,T_{v,n}^2 \le \frac{C}{nF^2(h)}\left(F(h) + \frac{h^2F(h)}{n r_n v(h)} + \frac{v(h)}{r_n}\right).$$



Proof : As announced above it suffices to prove the Proposition for T„ 



ET2 = E 



\nK{l)F{h) 
1 



n^K^{l)F^ {h) 



E < E 



E 



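since for $i \neq j$, $E\left[(\varepsilon_i\omega_{i,n}\varepsilon_j\omega_{j,n})\,|\,X_1, \dots, X_n\right] = \omega_{i,n}\omega_{j,n}E\left[(\varepsilon_i\varepsilon_j)\,|\,X_1, \dots, X_n\right] = 0$. Hence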



ET^ = — 



(/.)■ 



By Lemma [ 



^ '"^ V nrnv{h) 



from which we deduce the Proposition. 
Now we turn to the bias term.

5.2.2 Bias term 

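Remember that we have to deal with :

$$T_{b,n} = \frac{\sum_{i=1}^n (m(X_i) - m(x_0))\,\omega_{i,n}}{\sum_{i=1}^n \omega_{i,n}}.$$

Copying what was done above with $T_{v,n}$, we know that we can focus on

$$\tilde T_{b,n} = \frac{\sum_{i=1}^n (m(X_i) - m(x_0))\,\omega_{i,n}}{nK(1)F(h)}$$

via Lemma 36. For each $i$ there exists $c_i \in B(x_0, h)$ such that :

$$m(X_i) - m(x_0) = \langle m'(x_0), Z_i\rangle + \frac{1}{2}\langle m''(c_i)(Z_i), Z_i\rangle$$

with $Z_i = X_i - x_0$. We deal with the first and second order derivatives separately : $\tilde T_{b,n} = T_{b,n,1} + T_{b,n,2}$
with

$$T_{b,n,1} = \frac{\sum_{i=1}^n \langle m'(x_0), Z_i\rangle\,\omega_{i,n}}{nK(1)F(h)}, \qquad
T_{b,n,2} = \frac{1}{2}\,\frac{\sum_{i=1}^n \langle m''(c_i)(Z_i), Z_i\rangle\,\omega_{i,n}}{nK(1)F(h)}.$$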

Proposition 38 We have : 



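Proof of the Proposition

We first see that :

$$\sum_{i=1}^n \langle m'(x_0), X_i - x_0\rangle\,\omega_{i,n}
= \sum_{i=1}^n \langle m'(x_0), Z_i\rangle K_i\left[1 - \left\langle Z_i, \Gamma_{n,K}^{\dagger}\bar Z_{K,n}\right\rangle\right]$$

$$= \sum_{i=1}^n \langle m'(x_0), Z_i\rangle K_i - \sum_{i=1}^n \langle m'(x_0), Z_i\rangle K_i\left\langle Z_i, \Gamma_{n,K}^{\dagger}\bar Z_{K,n}\right\rangle$$

$$= n\left\langle m'(x_0), \bar Z_{K,n}\right\rangle - n\left\langle \Gamma_{n,K}m'(x_0), \Gamma_{n,K}^{\dagger}\bar Z_{K,n}\right\rangle
= n\left\langle m'(x_0), \left(I - \Gamma_{n,K}\Gamma_{n,K}^{\dagger}\right)\bar Z_{K,n}\right\rangle$$

and

$$T_{b,n,1} = \frac{1}{K(1)F(h)}\left\langle m'(x_0), \left(I - \Gamma_{n,K}\Gamma_{n,K}^{\dagger}\right)\bar Z_{K,n}\right\rangle.$$

Then we split the inner product into two terms :

$$\left\langle \left(I - \Gamma_{n,K}^{\dagger}\Gamma_{n,K}\right)m'(x_0), \bar Z_{K,n} - E\bar Z_{K,n}\right\rangle
+ \left\langle \left(I - \Gamma_{n,K}^{\dagger}\Gamma_{n,K}\right)m'(x_0), E\bar Z_{K,n}\right\rangle.$$

The $L^2$ norm of the first is bounded by $Ch\sqrt{F(h)/n}$ (see Lemma 32) and the $L^2$ norm of the second is
bounded by $Cv(h)$ (see Lemma 30). This finishes the proof of Proposition 38.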
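We turn to $T_{b,n,2}$ and cut it into two parts :

$$T_{b,n,2} = \frac{1}{2}\,\frac{\sum_{i=1}^n \langle m''(c_i)(Z_i), Z_i\rangle K_i}{nK(1)F(h)}
- \frac{1}{2}\,\frac{\sum_{i=1}^n \langle m''(c_i)(Z_i), Z_i\rangle K_i\left\langle Z_i, \Gamma_{n,K}^{\dagger}\bar Z_{K,n}\right\rangle}{nK(1)F(h)}
= R_{bn1} + R_{bn2}.$$

The two forthcoming Propositions aim at giving a bound for the mean square norm of $R_{bn1}$ and $R_{bn2}$.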
Proposition 39 We get

$$E\,R_{bn1}^2 \le C\left(h^4 + \frac{h^4}{nF(h)}\right).$$


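Proof of the Proposition :

It is plain to see that for all $i$ and when Assumption $A_5$ holds

$$\left|\langle m''(c_i)(Z_i), Z_i\rangle\right| K_i \le \left(\sup_{x \in \mathcal{V}(x_0)} \|m''(x)\|_{\infty}\right)\|Z_i\|^2 K_i,$$

hence that

$$\left|R_{bn1}\right| \le \frac{C}{2}\,\frac{\sum_{i=1}^n K_i\|Z_i\|^2}{nK(1)F(h)}.$$

It follows that

$$E\,R_{bn1}^2 \le C\,\frac{E\left(\sum_{i=1}^n K_i\|Z_i\|^2\right)^2}{n^2F^2(h)}.$$

Then

$$E\,R_{bn1}^2 \le \frac{C}{F^2(h)}\left[\frac{1}{n}E\left(K_1^2\|Z_1\|^4\right) + \frac{1}{n^2}\sum_{1 \le i \ne j \le n} E\left(K_i\|Z_i\|^2 K_j\|Z_j\|^2\right)\right]$$

$$\le \frac{C}{F^2(h)}\left[\frac{1}{n}E\left(K_1^2\|Z_1\|^4\right) + \left(E\,K_1\|Z_1\|^2\right)^2\right]
\le \frac{C}{F^2(h)}\left[\frac{h^4F(h)}{n} + h^4F^2(h)\right]
= C\left[\frac{h^4}{nF(h)} + h^4\right].$$
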



We turn to $R_{bn2}$.
Proposition 40 We have : 



Dealing with $R_{bn2}$ is a bit more complicated. We have



-2i?b„2 — 



1 



^i^KZK,n. - (™" (^») ' ) • 



The next operation consists in replacing $\bar Z_{K,n}$ by its expectation. Like above in the proof of Proposition
38 as well as in the proof of Lemmas 33 and 35 we can add and subtract $E\bar Z_{K,n}$ from $\bar Z_{K,n}$. Once again
we decide not to go through details here for the sake of shortness and clarity. Finally since the remaining
term involving $\bar Z_{K,n} - E\bar Z_{K,n}$ tends to zero quicker in mean square, we can focus on :



n,K 



\\^KZ\\ 



1 " 

- V(m" {c,){Z,),Z,)K,Z^ 



(30) 



At last we have to deal with 



E 



1 " 

- ic,)iZ,),Z,)K,Z, 

nrt f 



i=l 



Easy computations give : 



1 " 

- V(m" {ci){Z,),Z,)K,Zi 
n ^-^ 

i—l 

n 



i=l 



■ (to" (c) {Zi) , Z,) (to" (c,) (Z,) , Zj> {K,Z^, KjZ, 



(31) 
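We take expectations now and apply assumption $A_5$ to the first sum :

$$\frac{1}{n^2}\sum_{i=1}^n E\left[\langle m''(c_i)(Z_i), Z_i\rangle^2 K_i^2\|Z_i\|^2\right] \le \frac{C}{n}E\left(K^2\|Z\|^6\right) \le \frac{C}{n}h^6F(h).$$

Since $h^6F(h)/n$ tends to zero at a rate much quicker than the next term we do not let it appear in the
Proposition.

We fix $i \ne j$ in (31) and take expectation :

$$E\left[\langle m''(c_i)Z_i, Z_i\rangle\langle m''(c_j)Z_j, Z_j\rangle\langle K_iZ_i, K_jZ_j\rangle\right]
= \left\langle E\left[\langle m''(c_i)Z_i, Z_i\rangle K_iZ_i\right], E\left[\langle m''(c_j)Z_j, Z_j\rangle K_jZ_j\right]\right\rangle.$$

By the same assumption we get

$$\left|E\left[\langle m''(c_i)Z_i, Z_i\rangle\langle m''(c_j)(Z_j), Z_j\rangle\langle K_iZ_i, K_jZ_j\rangle\right]\right|
\le \left(E\left\|\langle m''(c)Z, Z\rangle KZ\right\|\right)^2
\le C\left(E\left[K\|Z\|^3\right]\right)^2
\le Ch^6F^2(h).$$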
Finally with (I30p at hand we have : 

C 



Ei?: 



fan 2 



< 



(h) v^r^ \ n 



h^F{h) + Ch^F^{h) 



< C- 



since nF (h) — > +00. 

At last we finish with the proof of the main Theorem which is considerably alleviated by all that was 
done above. 



Proof of Theorem I17L Proposition 1201 and Proposition [ 

The proof of the Theorem stems from Propositions 37, 38, 39 and 40. Collecting these
previous results we have :



E {fhn (xq) - m (xo)) <C 
+ C 



1 



nF^ (h) 



F{h) 



h^F{h) v{h) 



nrnV (h) r„ 



nF{h) F^{h)_ 
First from

$$v(h) \le h^2F(h),$$

we see that the first line above will be an $O\left(1/(nF(h))\right)$ whenever $h^2/r_n$ and $h^2/(n r_n v(h))$ are bounded.
We turn to the second line. The term $h^4/(nF(h))$ may be removed because it is negligible
with respect to the variance term. In order to reach an $O(h^4)$ for the bias we have to bound $h^2/r_n^2$ and
$1/(h^2 nF(h))$.

At last summing up all that was done above comes down to taking $r_n \asymp h$, and $n \cdot \min\{v(h)/h, h^2F(h)\} \ge
C > 0$.

Following the results of Mas (2007b) this last inequality comes down, when $\rho$ is regularly varying at
0 with positive index, to :

$$nF(h) \cdot \min\{\rho(h), h^2\} \ge C > 0.$$

And Theorem 17 is proved.